cdr loop
Generative Co-Design of Antibody Sequences and Structures via Black-Box Guidance in a Shared Latent Space
Yao, Yinghua, Pan, Yuangang, Chen, Xixian
Advancements in deep generative models have enabled the joint modeling of antibody sequence and structure, given the antigen-antibody complex as context. However, existing approaches for optimizing complementarity-determining regions (CDRs) to improve developability properties operate in the raw data space, where the search is inefficient and property evaluations become excessively costly. To address this, we propose LatEnt blAck-box Design (LEAD), a sequence-structure co-design framework that optimizes both sequence and structure within their shared latent space. Optimizing shared latent codes not only overcomes the limitations of existing methods but also keeps the sequence and structure designs synchronized. In particular, we design a black-box guidance strategy to accommodate real-world scenarios where many property evaluators are non-differentiable. Experimental results demonstrate that LEAD achieves superior optimization performance for both single-property and multi-property objectives. Notably, LEAD reduces query consumption by half while surpassing baseline methods in property optimization. The code is available at https://github.com/EvaFlower/
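To make the idea of guiding a latent code with a non-differentiable evaluator concrete, the following is a minimal sketch of a generic zeroth-order (evolution-strategy) update. It is not LEAD's guidance strategy, which the abstract does not specify. The function name es_step, the parameters sigma, lr and n_pairs, and the toy score function are all assumptions made for illustration.

```python
import random

def es_step(z, score, rng, sigma=0.1, lr=0.05, n_pairs=16):
    """One gradient-free ascent step on a latent code z (hypothetical helper).

    `score` is treated as a black box: it is only ever called, never
    differentiated. Each step spends 2 * n_pairs evaluator queries.
    """
    grad = [0.0] * len(z)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in z]
        # Antithetic pair: evaluate the black box at z + sigma*eps and z - sigma*eps.
        plus = score([zi + sigma * e for zi, e in zip(z, eps)])
        minus = score([zi - sigma * e for zi, e in zip(z, eps)])
        weight = (plus - minus) / (2.0 * sigma)
        for i, e in enumerate(eps):
            grad[i] += weight * e / n_pairs
    # Move the latent code along the estimated ascent direction.
    return [zi + lr * g for zi, g in zip(z, grad)]

def toy_score(z):
    """Stand-in property evaluator (assumption): best at z = (1, 1)."""
    return -sum((zi - 1.0) ** 2 for zi in z)

rng = random.Random(0)
z = [0.0, 0.0]
for _ in range(50):
    z = es_step(z, toy_score, rng)
```

In a latent-space setting the optimized code would then be decoded into a sequence and a structure together. The query count per step (2 * n_pairs here) is the quantity that a query-efficiency comparison would track.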
Fast and Accurate Antibody Sequence Design via Structure Retrieval
Zhang, Xingyi, Xie, Kun, Huang, Ningqiao, Liu, Wei, Zhao, Peilin, Wang, Sibo, Zhao, Kangfei, Jiang, Biaobin
Recent advancements in protein design have leveraged diffusion models to generate structural scaffolds, followed by a process known as protein inverse folding, which involves sequence inference on these scaffolds. However, these methodologies face significant challenges when applied to hypervariable structures such as antibody complementarity-determining regions (CDRs), where sequence inference frequently results in non-functional sequences due to hallucinations. Unlike prevailing protein inverse folding approaches, this paper introduces IgSeek, a novel structure-retrieval framework that infers CDR sequences by retrieving similar structures from a natural antibody database. Specifically, IgSeek employs a simple yet effective multi-channel equivariant graph neural network to generate high-quality geometric representations of CDR backbone structures. Subsequently, it aligns sequences of structurally similar CDRs and utilizes structurally conserved sequence motifs to enhance inference accuracy. Our experiments demonstrate that IgSeek is not only highly efficient in structural retrieval but also outperforms state-of-the-art approaches in sequence recovery for both antibodies and T-cell receptors, offering a new retrieval-based perspective for therapeutic protein design.
1 MAIN
Antibodies, known for their high specificity and affinity, have emerged as pivotal therapeutic agents in the treatment of complex diseases, including cancer (Adams & Weiner, 2005), autoimmune disorders (Feldmann & Maini, 2003), and infectious diseases (Abraham, 2020). In 2023, the global best-selling drug was Keytruda, a cancer treatment antibody, with sales reaching $25 billion, surpassing Humira, another antibody used for treating rheumatoid arthritis, which had dominated the market for the past decade (Dunleavy, 2024). Traditionally, the discovery of antibodies has predominantly relied on immunizing animals with antigens (Van Wauwe et al., 1980) or employing various display techniques such as phage display (MacCallum et al., 1996) and yeast display (Chao et al., 2006). However, these approaches face significant challenges when dealing with structurally intricate proteins, which are difficult to express in a soluble and functional form. Additionally, even when numerous candidate antibodies are generated through these techniques, they may not necessarily bind to the desired domain or exhibit therapeutic efficacy.
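The retrieve-then-infer idea described in the IgSeek abstract can be sketched roughly as follows. This is an illustrative toy, not the IgSeek implementation: the embeddings are assumed to be precomputed vectors (standing in for the graph neural network representations), and a per-position majority vote over equal-length hits stands in for sequence alignment and motif extraction. The names cosine, retrieve and consensus are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query, database, k=3):
    """Return the sequences of the k database entries closest to `query`.

    `database` is a list of (embedding, sequence) pairs (assumed layout).
    """
    ranked = sorted(database, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return [seq for _, seq in ranked[:k]]

def consensus(seqs):
    """Per-position majority vote over hits that match the top hit's length."""
    length = len(seqs[0])
    same_length = [s for s in seqs if len(s) == length]
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*same_length))

# Toy database of (embedding, CDR sequence) pairs; all values are made up.
database = [
    ([1.0, 0.0], "ARDY"),
    ([0.9, 0.1], "ARDF"),
    ([0.0, 1.0], "GGGG"),
    ([0.8, 0.2], "ARDY"),
]
hits = retrieve([1.0, 0.0], database, k=3)
predicted = consensus(hits)
```

A linear scan is used here for clarity; a large antibody database would normally call for an approximate nearest-neighbor index instead.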
AntibodyFlow: Normalizing Flow Model for Designing Antibody Complementarity-Determining Regions
Xu, Bohao, Wang, Yanbo, Chen, Wenyu, Shan, Shimin
Therapeutic antibodies have been extensively studied in drug discovery and development in the past decades. Antibodies are specialized protective proteins that bind to antigens in a lock-and-key manner. The binding affinity between an antibody and a specific antigen is largely determined by the complementarity-determining regions (CDRs) of the antibody. Existing machine learning methods cast in silico development of CDRs as either sequence or 3D graph (with a single chain) generation tasks and have achieved initial success. However, because CDR loops have specific geometric shapes, learning the 3D geometric structures of CDRs remains a challenge. To address this issue, we propose AntibodyFlow, a 3D flow model to design antibody CDR loops. Specifically, AntibodyFlow first constructs the distance matrix, then predicts amino acids conditioned on the distance matrix. AntibodyFlow also conducts constraint learning and constrained generation to ensure valid 3D structures. Experimental results indicate that AntibodyFlow consistently outperforms the best baseline, with up to 16.0% relative improvement in validity rate and 24.3% relative reduction in geometric graph level error (root mean square deviation, RMSD).
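Two quantities named in this abstract, the distance matrix and the RMSD, are simple to state in code. The sketch below is illustrative only and says nothing about how AntibodyFlow itself builds or uses them. It assumes one 3D coordinate per residue (for example the C-alpha atom) and, for RMSD, that the two structures are already superimposed.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points, as a nested list."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rmsd(pred, ref):
    """Root mean square deviation between two pre-aligned coordinate sets."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))

# Toy three-residue loop; coordinates in angstroms are made up.
loop = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dmat = distance_matrix(loop)
```

The distance matrix is symmetric with a zero diagonal and is unchanged by rotating or translating the loop, which is one reason distance-based representations are convenient for geometric generation.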
ABodyBuilder3: Improved and scalable antibody structure predictions
Kenlay, Henry, Dreyer, Frédéric A., Cutting, Daniel, Nissley, Daniel, Deane, Charlotte M.
Accurate prediction of antibody structure is a central task in the design and development of monoclonal antibodies, notably to understand both their developability and their binding properties. In this article, we introduce ABodyBuilder3, an improved and scalable antibody structure prediction model based on ImmuneBuilder. We achieve a new state-of-the-art accuracy in the modelling of CDR loops by leveraging language model embeddings, and show how predicted structures can be further improved through careful relaxation strategies. Finally, we incorporate a predicted Local Distance Difference Test into the model output to allow for a more accurate estimation of uncertainties.
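For readers unfamiliar with the Local Distance Difference Test (lDDT), the sketch below computes a simplified, global, C-alpha-only variant between a model and a reference structure. It is an assumption-laden illustration: the standard score is defined per residue over all atoms, and the predicted lDDT mentioned in the abstract is a value the network estimates without access to a reference. The function name lddt_ca is hypothetical; the 15 angstrom inclusion radius and the 0.5, 1, 2 and 4 angstrom thresholds follow the usual definition.

```python
import math

def lddt_ca(model, ref, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT over one atom per residue (illustrative only).

    For every residue pair closer than `radius` in the reference, count how
    many thresholds the model's distance error stays under, then normalize.
    """
    if len(model) != len(ref):
        raise ValueError("model and reference must have the same length")
    preserved = 0
    total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # pair is outside the local neighborhood
            error = abs(math.dist(model[i], model[j]) - d_ref)
            preserved += sum(error < t for t in thresholds)
            total += len(thresholds)
    return preserved / total if total else 0.0

# Toy three-residue fragment; coordinates in angstroms are made up.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.1, 0.0, 0.0)]
score = lddt_ca(shifted, reference)
```

Because the score compares internal distances only, it needs no superposition of the two structures, unlike RMSD.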
De novo antibody design with SE(3) diffusion
Cutting, Daniel, Dreyer, Frédéric A., Errington, David, Schneider, Constantin, Deane, Charlotte M.
We introduce IgDiff, an antibody variable domain diffusion model based on a general protein backbone diffusion framework which was extended to handle multiple chains. Assessing the designability and novelty of the structures generated with our model, we find that IgDiff produces highly designable antibodies that can contain novel binding regions. The backbone dihedral angles of sampled structures show good agreement with a reference antibody distribution. We verify these designed antibodies experimentally and find that all express with high yield. Finally, we compare our model with a state-of-the-art generative backbone diffusion model on a range of antibody design tasks, such as the design of the complementarity determining regions or the pairing of a light chain to an existing heavy chain, and show improved properties and designability.
Inverse folding for antibody sequence design using deep learning
Dreyer, Frédéric A., Cutting, Daniel, Schneider, Constantin, Kenlay, Henry, Deane, Charlotte M.
We consider the problem of antibody sequence design given 3D structural information. Building on previous work, we propose a fine-tuned inverse folding model that is specifically optimised for antibody structures and outperforms generic protein models on sequence recovery and structure robustness when applied to antibodies, with notable improvement on the hypervariable CDR-H3 loop. We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters. Finally, we consider the applications of our model to drug discovery and binder design and evaluate the quality of proposed sequences using physics-based methods.
Multi-Task Learning with Loop Specific Attention for CDR Structure Prediction
Giovanoudi, Eleni, Rafailidis, Dimitrios
Predicting the structure of Complementarity Determining Region (CDR) loops has attracted considerable attention from researchers in antibody engineering. When designing antibodies, a main challenge is to predict the CDR structure of the H3 loop. Compared with the other CDR loops, namely the H1 and H2 loops, the structure of the H3 loop is harder to predict because of its varying length and flexible structure. In this paper, we propose a Multi-task learning model with Loop Specific Attention, namely MLSA. In particular, to the best of our knowledge, we are the first to jointly learn the three CDR loops via a novel multi-task learning strategy. In addition, to account for the structural and functional similarities and differences of the three CDR loops, we propose a loop-specific attention mechanism to control the influence of each CDR loop on the training of MLSA. Our experimental evaluation on widely used benchmark data shows that the proposed MLSA method reduces the prediction error of the CDR structure of the H3 loop by at least 19% when compared with other baseline strategies. Finally, for reproduction purposes we make the implementation of MLSA publicly available at https://anonymous.4open.science/r/MLSA-2442/.
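One simple way to let each loop's influence on training be controlled by a learned weight is to combine per-loop losses with softmax-normalized scores. The sketch below shows that generic pattern under stated assumptions. It is not the MLSA architecture, whose attention mechanism the abstract does not detail, and the names multitask_loss and attention_scores are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multitask_loss(loop_losses, attention_scores):
    """Weighted sum of per-loop losses, with weights from softmax(scores).

    Both arguments are dicts keyed by loop name (assumed layout). In a real
    model the scores would be trainable parameters; here they are plain floats.
    """
    names = sorted(loop_losses)
    weights = softmax([attention_scores[name] for name in names])
    total = sum(w * loop_losses[name] for w, name in zip(weights, names))
    return total, dict(zip(names, weights))

# Made-up per-loop structure errors; H3 is typically the hardest loop.
losses = {"H1": 1.0, "H2": 2.0, "H3": 3.0}
scores = {"H1": 0.0, "H2": 0.0, "H3": 0.0}
total, weights = multitask_loss(losses, scores)
```

With equal scores the combined loss reduces to the plain average of the three losses; raising the H3 score shifts weight toward the H3 term.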